Summary. Random‐graph mixture models are very popular for modelling real data networks. Parameter estimation procedures usually rely on variational approximations, either combined with the expectation–maximization (EM) algorithm or with Bayesian approaches. Despite good results on synthetic data, the validity of the variational approximation is, however, not established. Moreover, these variational...
Summary. Motivated by the conditional growth charts problem, we develop a method for conditional quantile analysis when predictors take values in a functional space. The method proposed aims at estimating conditional distribution functions under a generalized functional regression framework. This approach facilitates balancing of model flexibility and the curse of dimensionality for the infinite...
Summary. Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such models typically require O(n³) operations for a data set of size n. Various approximations of the covariance functions have been introduced to reduce the computational cost. However, most existing approximations...
Summary. We consider the general problem of constructing confidence regions for, possibly multi‐dimensional, parameters when we have available more than one approach for the construction. These approaches may be motivated by different model assumptions, different levels of approximation, different settings of tuning parameters or different Monte Carlo algorithms. Their effectiveness is often governed...
Summary. Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional...
Summary. Many methods for estimation or control of the false discovery rate (FDR) can be improved by incorporating information about π0, the proportion of all tested null hypotheses that are true. Estimates of π0 are often based on the number of p‐values that exceed a threshold λ. We first give a finite sample proof for conservative point estimation of the FDR when the λ‐parameter is fixed. Then...
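The abstract above describes estimating π0 from the number of p-values exceeding a threshold λ. A minimal sketch of that threshold-based estimator (the standard form #{pᵢ > λ}/(m(1 − λ)), assumed here since the full paper is not shown):

```python
import numpy as np

def estimate_pi0(pvals, lam=0.5):
    """Estimate pi0, the proportion of true null hypotheses.

    P-values from true nulls are uniform on [0, 1], so the count
    exceeding lam, rescaled by 1 / (1 - lam), estimates the number
    of true nulls; dividing by m gives the proportion.
    """
    pvals = np.asarray(pvals)
    m = pvals.size
    return np.sum(pvals > lam) / (m * (1.0 - lam))

# Toy example: 80 uniform (null) p-values mixed with 20 very small ones,
# so the true pi0 is 0.8.
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=80), rng.uniform(0, 0.01, size=20)])
pi0_hat = estimate_pi0(pvals, lam=0.5)
```

The estimate can exceed 1 in small samples, which is why the abstract's finite-sample conservativeness result for fixed λ is of interest.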
Summary. A general methodology is introduced for the construction and effective application of control variates to estimation problems involving data from reversible Markov chain Monte Carlo samplers. We propose the use of a specific class of functions as control variates, and we introduce a new consistent estimator for the values of the coefficients of the optimal linear combination of these functions...
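The control-variate idea the abstract builds on can be illustrated in a deliberately simplified setting (an i.i.d. sampler standing in for a reversible MCMC chain, and a single hand-picked control function rather than the paper's proposed class): to estimate E[f(X)], subtract a·(g(X) − E[g(X)]) for a function g with known mean, with the coefficient a chosen to minimize variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)   # samples (stand-in for MCMC draws from N(0, 1))
f = np.exp(x)                 # target: E[exp(X)] = exp(1/2)
g = x                         # control variate with known mean E[X] = 0

# Empirical estimate of the variance-minimizing linear coefficient
# a = Cov(f, g) / Var(g).
a = np.cov(f, g)[0, 1] / np.var(g)

plain = f.mean()                    # ordinary Monte Carlo estimate
cv = (f - a * (g - 0.0)).mean()     # control-variate estimate
```

The adjusted samples f − a·g have strictly smaller variance whenever f and g are correlated, so `cv` concentrates around exp(1/2) faster than `plain`. The paper's contribution is a systematic choice of such functions, and a consistent estimator of the optimal coefficients, for reversible MCMC output.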
Summary. We use Lévy processes to generate joint prior distributions, and therefore penalty functions, for a location parameter as p grows large. This generalizes the class of local–global shrinkage rules based on scale mixtures of normals, illuminates new connections between disparate methods and leads to new results for computing posterior means and modes under a wide class of priors. We extend...
Summary. We propose a new ‘fast subset scan’ approach for accurate and computationally efficient event detection in massive data sets. We treat event detection as a search over subsets of data records, finding the subset which maximizes some score function. We prove that many commonly used functions (e.g. Kulldorff's spatial scan statistic and extensions) satisfy the ‘linear time subset scanning’...
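The 'linear time subset scanning' property means the highest-scoring subset is guaranteed to be one of the nested top-j subsets when records are sorted by a priority ratio, so only m subsets need evaluation instead of 2^m. A sketch under assumptions not stated in the truncated abstract (an expectation-based Poisson scan statistic F(S) = C log(C/B) + B − C and priority cᵢ/bᵢ, one standard instance satisfying the property):

```python
import numpy as np

def poisson_score(C, B):
    """Expectation-based Poisson scan statistic for observed count C
    and expected (baseline) count B; zero unless counts exceed baseline."""
    return C * np.log(C / B) + B - C if C > B else 0.0

def fast_subset_scan(counts, baselines):
    """Evaluate only the nested top-j subsets under the priority
    ordering c_i / b_i, instead of all 2^m subsets."""
    order = np.argsort(-counts / baselines)
    best_score, best_subset = 0.0, []
    C = B = 0.0
    for j, i in enumerate(order, 1):
        C += counts[i]
        B += baselines[i]
        score = poisson_score(C, B)
        if score > best_score:
            best_score, best_subset = score, sorted(order[:j].tolist())
    return best_score, best_subset

# Toy data: records 0 and 3 have counts well above baseline.
counts = np.array([10.0, 3.0, 2.0, 9.0, 1.0])
baselines = np.array([4.0, 3.0, 2.0, 4.0, 2.0])
score, subset = fast_subset_scan(counts, baselines)
```

The loop is a single pass after one sort, which is the source of the method's efficiency on massive data sets.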
Summary. Association models, like frailty and copula models, are frequently used to analyse clustered survival data and to evaluate within‐cluster associations. The assumption of non‐informative censoring is commonly applied to these models, though it may not be true in many situations. We consider bivariate competing risk data and focus on association models specified for the bivariate cumulative...
Summary. A sufficient cause interaction between two exposures signals the presence of individuals for whom the outcome would occur only under certain values of the two exposures. When the outcome is dichotomous and all exposures are categorical, then, under certain no confounding assumptions, empirical conditions for sufficient cause interactions can be constructed on the basis of the sign of linear...
Summary. We show that, in functional data classification problems, perfect asymptotic classification is often possible, making use of the intrinsic very high dimensional nature of functional data. This performance is often achieved by linear methods, which are optimal in important cases. These results point to a marked contrast between classification for functional data and its counterpart in conventional...
Summary. For a reduced rank multivariate stochastic regression model of rank r*, the regression coefficient matrix can be expressed as a sum of r* unit rank matrices each of which is proportional to the outer product of the left and right singular vectors. For improving predictive accuracy and facilitating interpretation, it is often desirable that these left and right singular vectors be sparse...
Summary. The ‘expectation–conditional maximization either’ (ECME) algorithm has proven to be an effective way of accelerating the expectation–maximization algorithm for many problems. Recognizing the limitation of using prefixed acceleration subspaces in the ECME algorithm, we propose a dynamic ECME (DECME) algorithm which allows the acceleration subspaces to be chosen dynamically. The simplest DECME...
Summary. We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed ‘SAFE’ rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered...
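A sketch of the SAFE-style screening idea the abstract refers to: compare each univariate inner product |xⱼᵀy| against a bound involving the regularization level λ and λ_max (the smallest λ giving an all-zero solution), and discard predictors below it. The specific bound used here is the basic SAFE form from El Ghaoui and colleagues, assumed rather than taken from the truncated text:

```python
import numpy as np

def safe_discard(X, y, lam):
    """Boolean mask of predictors the SAFE rule discards at level lam:
    these columns are guaranteed to have zero lasso coefficients."""
    scores = np.abs(X.T @ y)
    lam_max = scores.max()  # smallest lam for which the solution is all zeros
    bound = lam - (np.linalg.norm(X, axis=0) * np.linalg.norm(y)
                   * (lam_max - lam) / lam_max)
    return scores < bound

# Toy example: only column 0 is truly predictive.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 200))
y = X[:, 0] * 3.0 + rng.normal(size=50)
lam_max = np.abs(X.T @ y).max()
mask = safe_discard(X, y, 0.9 * lam_max)
```

Because the rule is "safe", no discarded coefficient can be nonzero in the exact lasso solution; the variables that survive screening are the only ones that need to be entered into the optimization.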
Summary. It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics‐based distance between two communities, is one of the most commonly used tools for these analyses...
Summary. For extreme value modelling based on threshold techniques, a well‐documented issue is the sensitivity of inference from the model to the choice of threshold. The threshold above which we assume a non‐homogeneous Poisson process, or equivalently generalized Pareto representation, to be a reasonable approximation to the distribution is traditionally selected before analysis and subsequently...
Summary. We address the problem of providing inference from a Bayesian perspective for parameters selected after viewing the data. We present a Bayesian framework for providing inference for selected parameters, based on the observation that providing Bayesian inference for selected parameters is a truncated data problem. We show that if the prior for the parameter is non‐informative, or if the parameter...
Summary. The paper investigates the estimation problem in a regression‐type model. To be able to deal with potentially high dimensions, we provide a procedure called LOL (learning out of leaders) with no optimization step. LOL is an autodriven algorithm with two thresholding steps. A first adaptive thresholding helps to select leaders among the initial regressors to obtain a first reduction of dimensionality...